statistical dependence
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
Sliced Mutual Information: A Scalable Measure of Statistical Dependence
Mutual information (MI) is a fundamental measure of statistical dependence, with a myriad of applications to information theory, statistics, and machine learning. While it possesses many desirable structural properties, the estimation of high-dimensional MI from samples suffers from the curse of dimensionality. Motivated by statistical scalability to high dimensions, this paper proposes sliced MI (SMI) as a surrogate measure of dependence. SMI is defined as an average of MI terms between one-dimensional random projections. We show that it preserves many of the structural properties of classic MI, while gaining scalable computation and efficient estimation from samples. Furthermore, and in contrast to classic MI, SMI can grow as a result of deterministic transformations. This enables leveraging SMI for feature extraction by optimizing it over processing functions of raw data to identify useful representations thereof. Our theory is supported by numerical studies of independence testing and feature extraction, which demonstrate the potential gains SMI offers over classic MI for high-dimensional inference.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations
Frac, Jakub, Schmatz, Alexander, Li, Qiang, Van Wingen, Guido, Yu, Shujian
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies. Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs, which can be problematic for neuroimaging data where defining appropriate contrasts is non-trivial. We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data, providing a theoretically grounded approach that measures statistical dependence via density ratio decomposition in a reproducing kernel Hilbert space (RKHS),and applies HFMCA-based pretraining to learn robust and generalizable representations. Evaluations across five neuroimaging datasets demonstrate that our adapted method produces competitive embeddings for various classification tasks and enables effective knowledge transfer to unseen datasets. Codebase and supplementary material can be found here: https://github.com/fr30/mri-eigenencoder
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Measuring Dependencies between Biological Signals with Self-supervision, and its Limitations
Sariyanidi, Evangelos, Herrington, John D., Yankowitz, Lisa, Chaudhari, Pratik, Satterthwaite, Theodore D., Zampella, Casey J., Schultz, Robert T., Shinohara, Russell T., Tunc, Birkan
Measuring the statistical dependence between observed signals is a primary tool for scientific discovery. However, biological systems often exhibit complex non-linear interactions that currently cannot be captured without a priori knowledge regarding the nature of dependence. We introduce a self-supervised approach, concurrence, which is inspired by the observation that if two signals are dependent, then one should be able to distinguish between temporally aligned vs. misaligned segments extracted from them. Experiments with fMRI, physiological and behavioral signals show that, to our knowledge, concurrence is the first approach that can expose relationships across such a wide spectrum of signals and extract scientifically relevant differences without ad-hoc parameter tuning or reliance on a priori information, providing a potent tool for scientific discoveries across fields. However, dependencies caused by extraneous factors remain an open problem, thus researchers should validate that exposed relationships truly pertain to the question(s) of interest.
- North America > United States > Pennsylvania (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Diagnostic Medicine (0.69)
- Health & Medicine > Health Care Technology (0.68)
Sliced Mutual Information: A Scalable Measure of Statistical Dependence
Mutual information (MI) is a fundamental measure of statistical dependence, with a myriad of applications to information theory, statistics, and machine learning. While it possesses many desirable structural properties, the estimation of high-dimensional MI from samples suffers from the curse of dimensionality. Motivated by statistical scalability to high dimensions, this paper proposes sliced MI (SMI) as a surrogate measure of dependence. SMI is defined as an average of MI terms between one-dimensional random projections. We show that it preserves many of the structural properties of classic MI, while gaining scalable computation and efficient estimation from samples.
Learning Cortico-Muscular Dependence through Orthonormal Decomposition of Density Ratios
Ma, Shihan, Hu, Bo, Jia, Tianyu, Clarke, Alexander Kenneth, Zicher, Blanka, Caillet, Arnault H., Farina, Dario, Principe, Jose C.
The cortico-spinal neural pathway is fundamental for motor control and movement execution, and in humans it is typically studied using concurrent electroencephalography (EEG) and electromyography (EMG) recordings. However, current approaches for capturing high-level and contextual connectivity between these recordings have important limitations. Here, we present a novel application of statistical dependence estimators based on orthonormal decomposition of density ratios to model the relationship between cortical and muscle oscillations. Our method extends from traditional scalar-valued measures by learning eigenvalues, eigenfunctions, and projection spaces of density ratios from realizations of the signal, addressing the interpretability, scalability, and local temporal dependence of cortico-muscular connectivity. We experimentally demonstrate that eigenfunctions learned from cortico-muscular connectivity can accurately classify movements and subjects. Moreover, they reveal channel and temporal dependencies that confirm the activation of specific EEG channels during movement. Our code is available at https://github.com/bohu615/corticomuscular-eigen-encoder.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Conditional Dependence via U-Statistics Pruning
de Cabrera, Ferran, Vilà-Insa, Marc, Riba, Jaume
The problem of measuring conditional dependence between two random phenomena arises when a third one (a confounder) has a potential influence on the amount of information shared by the original pair. A typical issue in this challenging problem is the inversion of ill-conditioned autocorrelation matrices. This paper presents a novel measure of conditional dependence based on the use of incomplete unbiased statistics of degree two, which allows to re-interpret independence as uncorrelatedness on a finite-dimensional feature space. This formulation enables to prune data according to the observations of the confounder itself, thus avoiding matrix inversions altogether. Moreover, the proposed approach is articulated as an extension of the Hilbert-Schmidt independence criterion, which becomes expressible through kernels that operate on 4-tuples of data.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (3 more...)
A Geometric Framework for Neural Feature Learning
Xu, Xiangxiang, Zheng, Lizhong
We present a novel framework for learning system design based on neural feature extractors by exploiting geometric structures in feature spaces. First, we introduce the feature geometry, which unifies statistical dependence and features in the same functional space with geometric structures. By applying the feature geometry, we formulate each learning problem as solving the optimal feature approximation of the dependence component specified by the learning setting. We propose a nesting technique for designing learning algorithms to learn the optimal features from data samples, which can be applied to off-the-shelf network architectures and optimizers. To demonstrate the application of the nesting technique, we further discuss multivariate learning problems, including conditioned inference and multimodal learning, where we present the optimal features and reveal their connections to classical approaches.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
The Cross Density Kernel Function: A Novel Framework to Quantify Statistical Dependence for Random Processes
This paper proposes a novel multivariate definition of statistical dependence using a functional methodology inspired by Alfred R\'enyi. We define a new symmetric and self-adjoint cross density kernel through a recursive bidirectional statistical mapping between conditional densities of continuous random processes, which estimates their statistical dependence. Therefore, the kernel eigenspectrum is proposed as a new multivariate statistical dependence measure, and the formulation requires fewer assumptions about the data generation model than current methods. The measure can also be estimated from realizations. The proposed functional maximum correlation algorithm (FMCA) is applied to a learning architecture with two multivariate neural networks. The FMCA optimal solution is an equilibrium point that estimates the eigenspectrum of the cross density kernel. Preliminary results with synthetic data and medium size image datasets corroborate the theory. Four different strategies of applying the cross density kernel are thoroughly discussed and implemented to show the versatility and stability of the methodology, and it transcends supervised learning. When two random processes are high-dimensional real-world images and white uniform noise, respectively, the algorithm learns a factorial code i.e., the occurrence of a code guarantees that a certain input in the training set was present, which is quite important for feature learning.
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)